Multi-Layered Learning System for Real Robot Behavior Acquisition

Authors

  • Yasutake Takahashi
  • Minoru Asada
Abstract

This paper presents a series of studies on a multi-layered learning system for vision-based behavior acquisition by a real mobile robot. The system aims at building an autonomous robot that can develop its knowledge and behaviors from low levels to higher ones through interaction with its environment over its lifetime. The system creates learning modules with small, limited resources, acquires purposive behaviors with compact state spaces, and abstracts states and actions with the learned modules. To show the validity of the proposed methods, we apply them to simple soccer situations in the context of RoboCup (Asada et al. 1999) with real robots and show the experimental results.

Introduction

One of the main concerns about autonomous robots is how to implement a system with the learning capability to acquire both a variety of knowledge and behaviors through interaction between the robot and the environment during its lifetime. There has been a lot of work on different learning approaches for robots to acquire behaviors, based on methods such as reinforcement learning, genetic algorithms, and so on. In particular, reinforcement learning has recently been receiving increased attention as a method for behavior learning with little or no a priori knowledge and a high capability for reactive and adaptive behaviors. However, simple and straightforward application of reinforcement learning methods to real robot tasks is considerably difficult because of its almost endless exploration, whose time easily scales up exponentially with the size of the state/action spaces; this seems practically infeasible. One of the potential solutions might be the application of the so-called "mixture of experts" proposed by Jacobs and Jordan (Jacobs et al. 1991), in which a set of expert modules learn and a gating system weights the output of each expert module to form the final system output. This idea is very general and has a wide range of applications.
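The mixture-of-experts scheme just mentioned can be sketched as follows. This is a generic illustration of the idea, not the paper's implementation; all names (`W_experts`, `W_gate`, dimensions, the linear experts) are our own assumptions.

```python
import numpy as np

# Generic sketch of "mixture of experts" (Jacobs et al. 1991): each expert
# maps the input to an output, a gating network produces softmax weights,
# and the final output is the gate-weighted sum of the expert outputs.
rng = np.random.default_rng(0)
n_experts, dim_in, dim_out = 3, 4, 2
W_experts = rng.normal(size=(n_experts, dim_out, dim_in))  # linear experts
W_gate = rng.normal(size=(n_experts, dim_in))              # gating network

def mixture_output(x):
    expert_outs = np.array([W @ x for W in W_experts])  # (n_experts, dim_out)
    logits = W_gate @ x
    g = np.exp(logits - logits.max())
    g /= g.sum()                                        # softmax gate weights
    return g @ expert_outs                              # weighted combination

y = mixture_output(rng.normal(size=dim_in))  # final system output
```

Note that the gate only weights outputs; it adds no further abstraction, which is exactly the limitation discussed next.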
Copyright © 2004, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

However, we have to consider the following two issues to apply it to real robot tasks:

  • Task decomposition: how to find a set of simple behaviors and assign each of them to a learning module, or expert, in order to achieve the given initial task. Usually, a human designer carefully decomposes the long time-scale task into a sequence of simple behaviors such that each short time-scale subtask can be accomplished by one learning module.
  • Abstraction of state and/or action spaces for scaling up: the original "mixture of experts" consists of experts and a gate for expert selection. Therefore, there is no abstraction beyond the gating module. In order to cope with complicated real robot tasks, abstraction of the state and/or action spaces is necessary.

Connell and Mahadevan (Connell & Mahadevan 1993) decomposed the whole behavior into sub-behaviors, each of which can be learned independently. Morimoto and Doya (Morimoto & Doya 1998) applied a hierarchical reinforcement learning method by which an appropriate sequence of subgoals for the task is learned at the upper level while behaviors to achieve the subgoals are acquired at the lower level. Hasegawa and Fukuda (Hasegawa & Fukuda 1999; Hasegawa, Tanahashi, & Fukuda 2001) proposed a hierarchical behavior controller, which consists of three types of modules (behavior coordinator, behavior controller, and feedback controller), and applied it to a brachiation robot. Kleiner et al. (Kleiner, Dietl, & Nebel 2002) proposed a hierarchical learning system in which the modules in the lower layer acquire low-level skills and the module in the higher layer coordinates them. However, in these methods, either the task decomposition has been done very carefully by the designers in advance, or the construction of the state/action spaces for the higher-layer modules is independent of the learned behaviors of the lower modules.
As a result, it seems difficult to abstract situations and behaviors based on already acquired learning/control modules. A basic idea to cope with the above two issues is that any learning module has a limited resource constraint, and this constraint on learning capability leads us to introduce a multi-module, multi-layered learning system: one learning module has a compact state-action space and acquires a simple map from states to actions, and a gating system enables the robot to select one of the behavior modules depending on the situation. More generally, a higher module controls the lower modules depending on the situation. The definition of this situation depends on the capability of the lower modules, because the gating module selects one of the lower modules based on their acquired behaviors. From another viewpoint, the lower modules provide not only rational behaviors but also abstracted situations for the higher module: how feasible each module is, how close it is to its subgoal, and so on. It is reasonable to utilize such information to construct the state/action spaces of higher modules from the already abstracted situations and behaviors of lower ones. Thus, the hierarchical structure can be constructed not only from experts and a gating module but from more layers of multiple homogeneous learning modules. In this paper, we show a series of studies toward the developmental construction of such a hierarchical learning structure. The first study (Takahashi & Asada 2000) is the automatic construction of a hierarchical structure from purely homogeneous learning modules. Since the resources (and therefore the capability) of one learning module are limited, the initially given task is automatically decomposed into a set of small subtasks, each of which corresponds to one of the small learning modules, and the upper layer is recursively generated to cover the whole task.
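The layered idea described above can be illustrated with a small sketch in which each lower module exposes a goal-state activation, and the higher layer both builds its abstract state from these activations and uses them to select a module. The 1-D goal setup, the distance-based activation, and all names are our illustrative assumptions, not the paper's method.

```python
from dataclasses import dataclass

@dataclass
class LowerModule:
    goal: float  # 1-D subgoal position, purely illustrative

    def goal_activation(self, obs: float) -> float:
        # Normalized "closeness to subgoal": 1.0 at the goal, decaying with distance.
        return max(0.0, 1.0 - abs(obs - self.goal))

def abstract_state(modules, obs):
    # The higher layer's state: a discretized vector of lower-level goal
    # activations, so it stays compact regardless of the raw sensor space.
    return tuple(round(m.goal_activation(obs), 1) for m in modules)

def select_module(modules, obs):
    # A trivial gating rule for illustration: activate the module whose
    # subgoal is currently most feasible (highest activation).
    return max(range(len(modules)), key=lambda i: modules[i].goal_activation(obs))

mods = [LowerModule(goal=0.0), LowerModule(goal=1.0)]
state = abstract_state(mods, obs=0.8)   # (0.2, 0.8)
chosen = select_module(mods, obs=0.8)   # index 1: its subgoal is closest
```

The point of the sketch is that the higher layer never sees raw sensor values; it sees only what the lower modules report about their own progress.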
In this case, all learning modules in one layer share the same state and action spaces, although some modules need only part of them. The second work (Takahashi & Asada 2001) and the third (Takahashi & Asada 2003) then focused on state and action space decomposition according to the subtasks to make the learning much more efficient. Further, the fourth (Takahashi, Hikita, & Asada 2003) realized unsupervised decomposition of a long time-scale task by finding compact state spaces, which consequently leads to the subtask decomposition. We have applied these methods to simple soccer situations in the context of RoboCup (Asada et al. 1999) with real robots, and we show the experimental results.

Multi-Layered Learning System

The architecture of the multi-layered reinforcement learning system is shown in Figure 1, in which (a) and (b) indicate a hierarchical architecture with two levels and an individual learning module embedded in the layers, respectively. Each module has its own goal state in its state space, and it learns the behavior to reach that goal, i.e., to maximize the sum of the discounted rewards received over time, based on Q-learning. At the bottom level, the states and actions are constructed from sensory information and motor commands, respectively. The input to and output from the higher level are the goal state activation and the behavior activation, respectively, as shown in Figure 1(b). The goal state activation g is a normalized state value, and g = 1 when the situation is the goal state. When a module receives the behavior activation from a higher module, it calculates the optimal policy for its own goal and sends action commands to the lower module. An action command at the bottom level is translated into an actual motor command. The state value function estimates the sum of the discounted rewards received over time when the robot follows the optimal policy, and is obtained by Q-learning.

Module Learning
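A minimal sketch of one such learning module, assuming a small discrete state/action space; the class and parameter names, the constants, and the toy 5-state chain task below are our illustrative assumptions, not the paper's setup.

```python
import numpy as np

class LearningModule:
    """One module: a goal state, a Q-table, and a goal-state activation."""

    def __init__(self, n_states, n_actions, goal_state, alpha=0.1, gamma=0.9):
        self.Q = np.zeros((n_states, n_actions))
        self.goal = goal_state
        self.alpha, self.gamma = alpha, gamma

    def update(self, s, a, r, s_next):
        # Standard Q-learning update toward the discounted optimal return.
        td_target = r + self.gamma * self.Q[s_next].max()
        self.Q[s, a] += self.alpha * (td_target - self.Q[s, a])

    def policy(self, s):
        # Greedy action for this module's own goal.
        return int(self.Q[s].argmax())

    def goal_activation(self, s):
        # g: the state value normalized to [0, 1]; g = 1 at the goal state.
        if s == self.goal:
            return 1.0
        v = self.Q.max(axis=1)          # state value function from Q
        vmax = v.max()
        return float(v[s] / vmax) if vmax > 0 else 0.0

# Usage on a 5-state chain where the goal is state 4 and reward 1 is given
# on arrival; after training, the greedy policy moves right toward the goal.
rng = np.random.default_rng(0)
mod = LearningModule(n_states=5, n_actions=2, goal_state=4)
for _ in range(500):
    s = 0
    while s != 4:
        a = int(rng.integers(2))                        # explore randomly
        s_next = max(0, s - 1) if a == 0 else min(4, s + 1)
        mod.update(s, a, 1.0 if s_next == 4 else 0.0, s_next)
        s = s_next
```

After training, the goal activation g rises monotonically toward the goal state, which is exactly the abstracted "how close to its subgoal" signal a higher module can consume.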



Publication date: 2004